Results 1 - 11 of 11
1.
medRxiv ; 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38405784

ABSTRACT

Importance: Large language models (LLMs) are crucial for medical tasks. Ensuring their reliability is vital to avoid false results. Our study assesses two state-of-the-art LLMs (ChatGPT and LlaMA-2) for extracting clinical information, focusing on cognitive tests like MMSE and CDR. Objective: Evaluate ChatGPT and LlaMA-2 performance in extracting MMSE and CDR scores, including their associated dates. Methods: Our data consisted of 135,307 clinical notes (Jan 12th, 2010 to May 24th, 2023) mentioning MMSE, CDR, or MoCA. After applying inclusion criteria, 34,465 notes remained, of which 765 were processed by ChatGPT (GPT-4) and LlaMA-2, and 22 experts reviewed the responses. ChatGPT successfully extracted MMSE and CDR instances with dates from 742 notes. We used 20 notes for fine-tuning and training the reviewers. The remaining 722 were assigned to reviewers, with 309 each assigned to two reviewers simultaneously. Inter-rater agreement (Fleiss' Kappa), precision, recall, true/false negative rates, and accuracy were calculated. Our study follows TRIPOD reporting guidelines for model validation. Results: For MMSE information extraction, ChatGPT (vs. LlaMA-2) achieved accuracy of 83% (vs. 66.4%), sensitivity of 89.7% (vs. 69.9%), true-negative rates of 96% (vs. 60.0%), and precision of 82.7% (vs. 62.2%). For CDR the results were lower overall, with accuracy of 87.1% (vs. 74.5%), sensitivity of 84.3% (vs. 39.7%), true-negative rates of 99.8% (vs. 98.4%), and precision of 48.3% (vs. 16.1%). We qualitatively evaluated the MMSE errors of ChatGPT and LlaMA-2 on double-reviewed notes. LlaMA-2 errors included 27 cases of total hallucination, 19 cases of reporting other scores instead of MMSE, 25 missed scores, and 23 cases of reporting only the wrong date. In comparison, ChatGPT's errors included only 3 cases of total hallucination, 17 cases of wrong test reported instead of MMSE, and 19 cases of reporting a wrong date.
Conclusions: In this diagnostic/prognostic study of ChatGPT and LlaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy, with better performance compared to LlaMA-2. The use of LLMs could benefit dementia research and clinical care by identifying patients eligible for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.
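
As an illustration of the evaluation measures named in the Methods (accuracy, sensitivity, true-negative rate, precision), the following is a minimal sketch of how they derive from a confusion matrix. The class name `Confusion`, the function `metrics`, and the counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing model extractions against reviewer labels."""
    tp: int  # score present in the note and correctly extracted
    fp: int  # score reported by the model but wrong or absent in the note
    tn: int  # no score in the note and none reported
    fn: int  # score present in the note but missed by the model


def metrics(c: Confusion) -> dict[str, float]:
    """Standard confusion-matrix measures as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),         # also called recall
        "true_negative_rate": c.tn / (c.tn + c.fp),  # also called specificity
        "precision": c.tp / (c.tp + c.fp),
    }


# Hypothetical counts for illustration only; not the study's data.
example = metrics(Confusion(tp=80, fp=20, tn=90, fn=10))
```

A low precision alongside a high true-negative rate, as reported for CDR, is possible when true positives are rare relative to the number of notes without a score.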

2.
Mol Cell Proteomics ; 15(3): 1060-71, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26631509

ABSTRACT

Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations, and splice variants identified in cancer cells are translated. Herein, we apply a proteogenomic data integration tool (QUILTS) to illustrate protein variant discovery using whole genome, whole transcriptome, and global proteome datasets generated from a pair of luminal and basal-like breast-cancer-patient-derived xenografts (PDX). The sensitivity of proteogenomic analysis for single nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS sample process replicates, defined here as an independent tandem MS experiment using identical sample material. Despite analysis of over 30 sample process replicates, only about 10% of SNVs (somatic and germline) detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencing were detected (<0.1%). Peptides mapping to DNA-detected SNVs without a detectable mRNA transcript were also observed, suggesting that transcriptome coverage was incomplete (∼80%). In contrast to germline variants, somatic variants were less likely to be detected at the peptide level in the basal-like tumor than in the luminal tumor, raising the possibility of differential translation or protein degradation effects. In conclusion, this large-scale proteogenomic integration allowed us to determine the degree to which mutations are translated and identify gaps in sequence coverage, thereby benchmarking current technology and progress toward whole cancer proteome and transcriptome analysis.


Subjects
Alternative Splicing , Mammary Neoplasms, Experimental/genetics , Mutation , Proteomics/methods , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Animals , Computational Biology/methods , Databases, Genetic , Female , Genome , Humans , Mammary Neoplasms, Experimental/metabolism , Mice , Polymorphism, Single Nucleotide , Tandem Mass Spectrometry , Transcriptome
3.
OMICS ; 17(2): 94-105, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23289783

ABSTRACT

Peptide and protein identification via tandem mass spectrometry (MS/MS) lies at the heart of proteomic characterization of biological samples. Several algorithms are able to search, score, and assign peptides to large MS/MS datasets. Most popular methods, however, underutilize the intensity information available in the tandem mass spectrum due to the complex nature of the peptide fragmentation process, thus contributing to loss of potential identifications. We present a novel probabilistic scoring algorithm called Context-Sensitive Peptide Identification (CSPI) based on highly flexible Input-Output Hidden Markov Models (IO-HMM) that capture the influence of peptide physicochemical properties on their observed MS/MS spectra. We use several local and global properties of peptides and their fragment ions from the literature. Comparison with two popular algorithms, Crux (re-implementation of SEQUEST) and X!Tandem, on multiple datasets of varying complexity shows that peptide identification scores from our models achieve greater discrimination between true and false peptides, identifying up to ∼25% more peptides at a False Discovery Rate (FDR) of 1%. We evaluated two alternative normalization schemes for fragment ion intensities, one global and rank-based, the other local and window-based. Our results indicate the importance of appropriate normalization methods for learning superior models. Further, by combining our scores with Crux using a state-of-the-art procedure, Percolator, we demonstrate the utility of scoring features from intensity-based models, obtaining ∼4-8% additional identifications over Percolator at 1% FDR. IO-HMMs offer a scalable and flexible framework with several modeling choices to learn complex patterns embedded in MS/MS data.
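
The abstract does not specify how the two normalization schemes are defined, so the following is only a sketch of one plausible reading: a global scheme that replaces each intensity by its scaled rank within the spectrum, and a local scheme that scales each intensity by the maximum in its m/z window. The function names, the 100 m/z window width, and the example peaks are assumptions, not the authors' implementation.

```python
def rank_normalize(intensities: list[float]) -> list[float]:
    """Global scheme (assumed): rank of each peak within the spectrum, scaled to (0, 1]."""
    order = sorted(range(len(intensities)), key=lambda i: intensities[i])
    ranks = [0.0] * len(intensities)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank / len(intensities)
    return ranks


def window_normalize(mz: list[float], intensities: list[float],
                     width: float = 100.0) -> list[float]:
    """Local scheme (assumed): divide each peak by the largest peak in its m/z window."""
    window_max: dict[int, float] = {}
    for m, inten in zip(mz, intensities):
        key = int(m // width)
        window_max[key] = max(window_max.get(key, 0.0), inten)
    return [
        inten / window_max[int(m // width)] if window_max[int(m // width)] > 0 else 0.0
        for m, inten in zip(mz, intensities)
    ]


# Hypothetical four-peak spectrum, for illustration only.
mz_values = [150.0, 180.0, 420.0, 460.0]
peak_intensities = [10.0, 40.0, 5.0, 20.0]
global_norm = rank_normalize(peak_intensities)
local_norm = window_normalize(mz_values, peak_intensities)
```

In this toy spectrum the peak at 460 m/z is mid-ranked globally but is the strongest in its own window, which shows how the two schemes can present different inputs to a learned model.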


Subjects
Markov Chains , Peptides/analysis , Tandem Mass Spectrometry , Algorithms , Databases, Protein , Reproducibility of Results , Sensitivity and Specificity , Software
4.
Int Conf Collab Comput ; 2012: 591-596, 2012.
Article in English | MEDLINE | ID: mdl-25309967

ABSTRACT

Mass-spectrometry (MS) based proteomics has become a key enabling technology for the systems approach to biology, providing insights into the protein complement of an organism. Bioinformatics analyses play a critical role in interpretation of large, and often replicated, MS datasets generated across laboratories and institutions. A significant amount of computational effort in the workflow is spent on the identification of protein and peptide components of complex biological samples, and consists of a series of steps relying on large database searches and intricate scoring algorithms. In this work, we share our efforts and experience in efficient handling of these large MS datasets through database indexing and parallelization based on multiprocessor architectures. We also identify important challenges and opportunities that are relevant specifically to the task of peptide and protein identification, and more generally to similar multi-step problems that are inherently parallelizable.

5.
J Thorac Oncol ; 6(4): 725-34, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21304412

ABSTRACT

INTRODUCTION: Lung cancer remains the leading cause of cancer-related death with poor survival due to the late stage at which lung cancer is typically diagnosed. Given the clinical burden from lung cancer and the relatively favorable survival associated with early-stage lung cancer, biomarkers for early detection of lung cancer are of important potential clinical benefit. METHODS: We performed a global lung cancer serum biomarker discovery study using liquid chromatography-tandem mass spectrometry in a set of pooled non-small cell lung cancer case sera and matched controls. Immunoaffinity subtraction was used to deplete the top most abundant serum proteins; the remaining serum proteins were subjected to trypsin digestion and analyzed in triplicate by liquid chromatography-tandem mass spectrometry. The tandem mass spectrum data were searched against the human proteome database, and the resultant spectral counting data were used to estimate the relative abundance of proteins across the case/control serum pools. The spectral counting-derived abundances of some candidate biomarker proteins were confirmed with multiple reaction monitoring mass spectrometry assays. RESULTS: A list of 49 differentially abundant candidate proteins was compiled by applying a negative binomial regression model to the spectral counting data (p < 0.01). Functional analysis with Ingenuity Pathway Analysis tools showed significant enrichment of inflammatory response proteins, key molecules in cell-cell signaling and interaction network, and differential physiological responses for the two common non-small cell lung cancer subtypes. CONCLUSIONS: We identified a set of candidate serum biomarkers with statistically significant differential abundance across the lung cancer case/control pools, which, when validated, could improve lung cancer early detection.


Subjects
Adenocarcinoma/blood , Biomarkers, Tumor/blood , Blood Proteins/analysis , Carcinoma, Non-Small-Cell Lung/blood , Carcinoma, Squamous Cell/blood , Lung Neoplasms/blood , Adenocarcinoma/diagnosis , Aged , Carcinoma, Non-Small-Cell Lung/diagnosis , Carcinoma, Squamous Cell/diagnosis , Case-Control Studies , Chromatography, Liquid , Female , Humans , Lung/metabolism , Lung Neoplasms/diagnosis , Male , Middle Aged , Prognosis , Tandem Mass Spectrometry
6.
Disasters ; 34(3): 705-31, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20298262

ABSTRACT

This paper offers a potential measurement solution for assessing disaster impacts and subsequent recovery at the household level by using a modified domestic assets index (MDAI) approach. Assessment of the utility of the domestic assets index first proposed by Bates, Killian and Peacock (1984) has been confined to earthquake areas in the Americas and southern Europe. This paper modifies and extends the approach to the Indian sub-continent and to coastal surge hazards utilizing data collected from 1,000 households impacted by the Indian Ocean tsunami (2004) in the Nagapattinam district of south-eastern India. The analyses suggest that the MDAI scale is a reliable and valid measure of household living conditions and is useful in assessing disaster impacts and tracking recovery efforts over time. It can facilitate longitudinal studies, encourage cross-cultural, cross-national comparisons of disaster impacts and inform national and international donors of the itemized monetary losses from disasters at the household level.


Subjects
Culture , Family Characteristics , Tsunamis/economics , Analysis of Variance , Female , Humans , India , Indian Ocean , Male , Models, Econometric , Organizations , Reproducibility of Results , Risk Assessment/methods , Statistics as Topic
7.
AMIA Annu Symp Proc ; : 445-9, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18999186

ABSTRACT

Discretization acts as a variable selection method in addition to transforming the continuous values of a variable into discrete ones. Machine learning algorithms such as Support Vector Machines and Random Forests have been used for classification in high-dimensional genomic and proteomic data due to their robustness to the dimensionality of the data. We show that discretization can significantly improve the classification performance of these algorithms, as well as of algorithms like Naïve Bayes that are sensitive to the dimensionality of the data.
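
One way discretization can double as variable selection is that a supervised discretizer may find no useful cut point for an uninformative variable, which then collapses to a single bin and can be dropped. The sketch below illustrates that idea with a single entropy-based cut and an arbitrary gain threshold. The abstract does not name the discretization method used, so the functions `best_cut` and `discretize`, the threshold of 0.3 bits, and the toy data are all assumptions.

```python
import math
from collections import Counter


def entropy(labels: list[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values: list[float], labels: list[int], min_gain: float = 0.3):
    """Return the cut point with the largest information gain, or None if below min_gain."""
    pairs = sorted(zip(values, labels))
    base = entropy([y for _, y in pairs])
    best_gain, best_point = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        split = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - split > best_gain:
            best_gain = base - split
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_point if best_gain >= min_gain else None


def discretize(columns: dict[str, list[float]], labels: list[int],
               min_gain: float = 0.3) -> dict[str, list[int]]:
    """Binarize variables that admit a cut; variables with no cut are dropped."""
    kept = {}
    for name, values in columns.items():
        cut = best_cut(values, labels, min_gain)
        if cut is not None:
            kept[name] = [int(v > cut) for v in values]
    return kept


# Toy data for illustration only: one informative and one uninformative variable.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_columns = {
    "informative": [1.0, 1.2, 1.1, 3.0, 3.2, 2.9],
    "noise": [1.0, 2.0, 3.0, 1.1, 2.1, 3.1],
}
selected = discretize(toy_columns, toy_labels)
```

Here only the `informative` column survives, so a downstream classifier sees fewer and coarser features.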


Subjects
Algorithms , Artificial Intelligence , Database Management Systems , Databases, Factual , Decision Support Techniques , Pattern Recognition, Automated/methods
8.
AMIA Annu Symp Proc ; : 1033, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18999243

ABSTRACT

The Empirical Proteomic Ontology Knowledge Base (EPO-KB) is an online database that represents current knowledge of biomarkers and contains associations between mass-to-charge (m/z) ratios of mass-spectrometry peaks and proteins. Such a database is a useful tool for identifying putative proteins associated with an m/z ratio. At present, EPO-KB contains data that have been extracted from 120 published research papers. It has been used in the successful identification of a protein associated with a biomarker.
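
To make the idea of an m/z-to-protein lookup concrete, here is a minimal sketch of a tolerance-based query over a small in-memory table. It does not reflect the actual EPO-KB schema or interface. The `Association` record, the `lookup` function, the 0.3% tolerance, and the placeholder entries are all hypothetical.

```python
from typing import NamedTuple


class Association(NamedTuple):
    mz: float      # reported mass-to-charge ratio of the peak
    protein: str   # protein linked to that peak
    source: str    # publication the link was extracted from


# Placeholder entries for illustration only; not real EPO-KB content.
KB = [
    Association(4000.0, "protein_A", "paper_1"),
    Association(4010.0, "protein_B", "paper_2"),
    Association(9000.0, "protein_C", "paper_3"),
]


def lookup(kb: list[Association], query_mz: float,
           tolerance_pct: float = 0.3) -> list[Association]:
    """Return associations within a relative m/z tolerance, nearest first."""
    window = query_mz * tolerance_pct / 100.0
    hits = [a for a in kb if abs(a.mz - query_mz) <= window]
    return sorted(hits, key=lambda a: abs(a.mz - query_mz))


candidates = lookup(KB, 4003.0)
```

A relative tolerance is used here because mass accuracy typically scales with m/z; an absolute window would be an equally reasonable assumption.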


Subjects
Biomarkers/chemistry , Databases, Protein , Information Storage and Retrieval/methods , Natural Language Processing , Peptide Mapping/methods , Proteome/chemistry , Proteome/classification , User-Computer Interface
9.
Disasters ; 32(4): 537-60, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18435768

ABSTRACT

Studies on the impacts of hurricanes, tropical storms, and tornadoes indicate that poor communities of colour suffer disproportionately in human death and injury. Few quantitative studies have been conducted on the degree to which flood events affect socially vulnerable populations. We address this research void by analysing 832 countywide flood events in Texas from 1997 to 2001. Specifically, we examine whether geographic localities characterised by high percentages of socially vulnerable populations experience significantly more casualties due to flood events, adjusting for characteristics of the natural and built environment. Zero-inflated negative binomial regression models indicate that the odds of a flood casualty increase with the level of precipitation on the day of a flood event, flood duration, property damage caused by the flood, population density, and the presence of socially vulnerable populations. Odds decrease with the number of dams, the level of precipitation on the day before a recorded flood event, and the extent to which localities have enacted flood mitigation strategies. The study concludes with comments on hazard-resilient communities and protection of casualty-prone populations.
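
For readers unfamiliar with the zero-inflated negative binomial (ZINB) model, the sketch below writes out its probability mass function: a structural-zero probability mixed with an ordinary negative binomial count distribution. It covers only the distribution, not the regression fitting used in the study, and the parameter names (`mu`, `alpha`, `pi`) and example values are assumptions chosen for illustration.

```python
import math


def nb_pmf(k: int, mu: float, alpha: float) -> float:
    """Negative binomial pmf with mean mu > 0 and dispersion alpha > 0 (variance mu + alpha*mu^2)."""
    r = 1.0 / alpha
    log_p = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k: int, mu: float, alpha: float, pi: float) -> float:
    """ZINB pmf: with probability pi the count is a structural zero, otherwise NB."""
    base = (1.0 - pi) * nb_pmf(k, mu, alpha)
    return pi + base if k == 0 else base


# Illustrative parameters only; not estimates from the study.
MU, ALPHA, PI = 2.0, 1.0, 0.4
probs = [zinb_pmf(k, MU, ALPHA, PI) for k in range(400)]
total_mass = sum(probs)
mean_count = sum(k * p for k, p in enumerate(probs))
```

The zero-inflation term suits data like flood casualties, where most events record no casualties at all and the remainder are overdispersed counts.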


Subjects
Disaster Planning , Environment Design , Floods/mortality , Disasters/statistics & numerical data , Floods/statistics & numerical data , Geography , Humans , Models, Statistical , Risk Assessment , Texas
10.
Disasters ; 32(1): 1-18, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18217915

ABSTRACT

Floods continue to pose the greatest threat to the property and safety of human communities among all natural hazards in the United States. This study examines the relationship between the built environment and flood impacts in Texas, which consistently sustains more damage from flooding than any other state in the country. Specifically, we calculate property damage resulting from 423 flood events between 1997 and 2001 at the county level. We identify the effect of several built environment measures, including wetland alteration, impervious surface, and dams, on reported property damage while controlling for biophysical and socio-economic characteristics. Statistical results suggest that naturally occurring wetlands play a particularly important role in mitigating flood damage. These findings provide guidance to planners and flood managers on how to alleviate most effectively the costly impacts of floods at the community level.


Subjects
Disasters , Environment Design , Wetlands , Demography , Geography , Humans , Pilot Projects , Poverty , Regression Analysis , Risk Factors , Socioeconomic Factors , Texas
11.
Environ Manage ; 38(4): 597-617, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16933080

ABSTRACT

Recent interest in expanding offshore oil production within waters of the United States has been met with opposition by groups concerned with recreational, environmental, and aesthetic values associated with the coastal zone. Although the proposition of new oil platforms off the coast has generated conflict over how coastal resources should be utilized, little research has been conducted on where these user conflicts might be most intense and which sites might be most suitable for locating oil production facilities in light of the multiple, and oftentimes competing, interests. In this article, we develop a multiple-criteria spatial decision support tool that identifies the potential degree of conflict associated with oil and gas production activities for existing lease tracts in the coastal margin of Texas. We use geographic information systems to measure and map a range of potentially competing representative values impacted by establishing energy extraction infrastructure and then spatially identify which leased tracts are the least contentious sites for oil and gas production in Texas state waters. Visual and statistical results indicate that oil and gas lease blocks within the study area vary in their potential to generate conflict among multiple stakeholders.


Subjects
Conflict, Psychological , Conservation of Natural Resources , Geographic Information Systems , Geologic Sediments/analysis , Hydrocarbons/analysis , Decision Making , Ecosystem , Environmental Monitoring , Petroleum , Texas , Water Pollutants/analysis